Employee Sentiment (GlassDoor) & Returns in the Oil Sector
Preface
Employees are the heart and soul of a business. Their actions and performance drive the profitability of a business and its ability to maximize shareholder value. In this document, we address the following question:
Is there a relationship between employee sentiment and excess returns?
By building a web-scraping program and applying it to GlassDoor, we were able to compile a comprehensive list of employee reviews for four companies:
- ExxonMobil (XOM)
- Chevron (CVX)
- Shell (SHEL)
- BP (BP)
For context on the analysis that occurs below, here is a chart that represents their respective share prices from the last 10 years:
We wanted to select companies that all operate in the same industry and are exposed to similar factors that affect price movements. This helps ensure that differences in excess returns reflect firm-level differences rather than differences in industry exposure.
Here is a chart showing how the review count changes through time, on a monthly basis. This is an important metric in how we selected our time frame for our analysis.
Note the small number of reviews during GlassDoor's early years. For this reason, we filtered the data to the post-2013 period.
Firstly, we needed to make sense of all the reviews and how best to compile and analyze them.
NLP
Natural language processing (NLP) is the computational analysis of language and speech. For our purposes, it means scoring words and phrases numerically or categorically based on the emotion and message they convey. Here is a short description of the two lexicons we applied:
- Bing: A dataset of 6,786 words with binary positive and negative sentiment scores
| word | sentiment |
|---|---|
| 2-faces | negative |
| abnormal | negative |
| abolish | negative |
| zest | positive |
| zippy | positive |
| zombie | negative |
- Afinn: A dataset of 2,477 words with scores ranging from -5 to 5:
| word | value |
|---|---|
| breathtaking | 5 |
| superb | 5 |
| thrilled | 5 |
| amazing | 4 |
| awesome | 4 |
| brilliant | 4 |
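To make the difference between the two lexicons concrete, here is a minimal sketch of word-level scoring. It is not the code used in this analysis: the two dictionaries are tiny excerpts built from words shown in this document, and the function names (`tokenize`, `bing_counts`, `afinn_score`) are hypothetical.

```python
import re

# Tiny illustrative excerpts only (assumption): the real lexicons hold
# thousands of words. Entries are taken from tables in this document.
BING = {"abnormal": "negative", "zest": "positive",
        "zippy": "positive", "zombie": "negative"}
AFINN = {"breathtaking": 5, "superb": 5, "thrilled": 5,
         "amazing": 4, "awesome": 4, "brilliant": 4, "bad": -3}


def tokenize(text):
    """Lowercase the text and split it into single-word tokens."""
    return re.findall(r"[a-z0-9'-]+", text.lower())


def bing_counts(text):
    """Count how many tokens Bing labels positive and negative."""
    counts = {"positive": 0, "negative": 0}
    for word in tokenize(text):
        label = BING.get(word)
        if label is not None:
            counts[label] += 1
    return counts


def afinn_score(text):
    """Sum the Afinn values of all tokens; unknown words score 0."""
    return sum(AFINN.get(word, 0) for word in tokenize(text))


review = "Amazing colleagues and superb benefits, but bad management"
print(bing_counts("Zippy team, zombie processes"))
print(afinn_score(review))
```

Bing only says whether a word is positive or negative, while Afinn also carries intensity, which is why the two are aggregated differently later on.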
Just for fun, here is a wordcloud of different phrases:
Overwhelmingly positive. One caveat to keep in mind: some people may post a negative review while lashing out emotionally, then go back and delete it out of guilt (or to save face, since many reviews list the reviewer's role). This could skew the surviving reviews toward the positive.
Even more fun, let's look at the negative ones:
These are just the high-frequency occurrences. If you scroll through a table of the lowest-rated sentiment words, it gets quite entertaining:
| date | Company | word | value |
|---|---|---|---|
| 2023-11-30 | bp | worse | -3 |
| 2023-11-29 | bp | losing | -3 |
| 2023-12-11 | bp | bad | -3 |
| 2023-07-24 | bp | lost | -3 |
| 2023-05-04 | bp | selfish | -3 |
| 2022-12-23 | bp | terrible | -3 |
| 2022-07-26 | bp | awful | -3 |
| 2021-08-11 | bp | worst | -3 |
| 2018-03-22 | bp | horrible | -3 |
| 2009-03-30 | bp | ugly | -3 |
| 2011-07-08 | bp | guilty | -3 |
| 2021-03-29 | bp | hate | -3 |
| 2015-04-19 | bp | desperate | -3 |
| 2014-01-15 | bp | hell | -4 |
| 2023-10-02 | bp | dead | -3 |
| 2023-07-15 | bp | badly | -3 |
| 2023-08-11 | bp | destroy | -3 |
| 2023-09-13 | bp | sucks | -3 |
| 2023-08-02 | bp | ridiculous | -3 |
| 2022-04-27 | bp | boring | -3 |
| 2021-11-27 | bp | fake | -3 |
| 2021-07-06 | bp | victim | -3 |
| 2016-07-13 | bp | crisis | -3 |
| 2010-08-20 | bp | die | -3 |
| 2016-01-19 | bp | destruction | -3 |
| 2015-05-13 | bp | dreadful | -3 |
| 2017-05-26 | bp | misleading | -3 |
| 2021-03-08 | bp | woeful | -3 |
| 2023-07-27 | Shell | suck | -3 |
| 2015-11-17 | Shell | loss | -3 |
| 2021-02-27 | Shell | racist | -3 |
| 2023-07-27 | Shell | cheated | -3 |
| 2023-12-21 | Shell | evil | -3 |
| 2023-03-02 | Shell | miserable | -3 |
| 2023-07-24 | Shell | greenwashing | -3 |
| 2021-08-17 | Shell | warning | -3 |
| 2022-09-17 | Shell | dumb | -3 |
| 2022-02-21 | Shell | slavery | -3 |
| 2013-11-29 | Shell | apathy | -3 |
| 2021-01-05 | Shell | destroying | -3 |
| 2008-09-12 | Shell | mediocrity | -3 |
| 2020-04-22 | Shell | nuts | -3 |
| 2018-05-29 | Shell | abusive | -3 |
| 2021-03-10 | Shell | crap | -3 |
| 2021-01-11 | ExxonMobil | rants | -3 |
| 2023-11-02 | ExxonMobil | kills | -3 |
| 2023-02-05 | ExxonMobil | apathetic | -3 |
| 2023-07-19 | ExxonMobil | loose | -3 |
| 2022-10-25 | ExxonMobil | destroys | -3 |
| 2022-04-05 | ExxonMobil | destroyed | -3 |
| 2022-07-08 | ExxonMobil | dire | -3 |
| 2012-07-17 | ExxonMobil | kill | -3 |
| 2020-10-25 | ExxonMobil | humiliation | -3 |
| 2013-10-04 | ExxonMobil | killing | -3 |
| 2015-05-03 | ExxonMobil | cheat | -3 |
| 2020-10-26 | ExxonMobil | illegal | -3 |
| 2022-03-27 | Chevron | fatalities | -3 |
| 2022-03-19 | Chevron | racism | -3 |
| 2021-08-31 | Chevron | damn | -4 |
| 2008-10-04 | Chevron | wtf | -4 |
| 2019-06-12 | Chevron | worried | -3 |
We also used the star ratings that employees leave whenever a review is posted (ranging from 1 star to 5 stars). To organize our data into workable metrics, we used individual words instead of n-grams (sequences of consecutive words), as the bulk of the reviews are short and sometimes contain only one word. To summarize, we opted for the following:
- Grouped reviews by year and averaged.
- Bing was organized into percentage of reviews that were deemed positive.
- Afinn was organized into average total score.
- Star ratings were averaged.
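The aggregation steps above can be sketched as follows. This is an illustration under assumptions, not the original pipeline: the records are made up, each review is assumed to already carry an Afinn total and a Bing label, and `yearly_metrics` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Made-up per-review records (assumption):
# (year, star rating, Afinn total for the review, Bing label for the review)
reviews = [
    (2014, 4, 5, "positive"),
    (2014, 2, -3, "negative"),
    (2015, 5, 9, "positive"),
    (2015, 3, 1, "positive"),
]


def yearly_metrics(rows):
    """Group reviews by year and compute the three summary metrics."""
    by_year = defaultdict(list)
    for year, stars, afinn, bing in rows:
        by_year[year].append((stars, afinn, bing))

    out = {}
    for year, items in sorted(by_year.items()):
        out[year] = {
            # Star ratings: simple average
            "mean_rating": mean(s for s, _, _ in items),
            # Afinn: average total score per review
            "mean_afinn": mean(a for _, a, _ in items),
            # Bing: share of reviews deemed positive
            "positive_prop": sum(b == "positive" for _, _, b in items) / len(items),
        }
    return out


for year, metrics in yearly_metrics(reviews).items():
    print(year, metrics)
```

Swapping the grouping key from year to (year, month) would give the monthly series used in the charts.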
Here are some visuals that are separated by company, depicting how these various metrics change through time. All charts use data aggregated by month to help tell the story:
Alpha Analysis
- Using CAPM, Alphas were calculated annually for each individual stock
- 1 Year Beta
- S&P500 utilized as a proxy for market return
- 10 Year U.S. Treasury security utilized as a proxy for the risk free rate
- Annual alphas were regressed against the prior year’s average review ratings and written sentiment, as defined by the Afinn dictionary
- Bing NLP excluded due to model multicollinearity
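For readers less familiar with CAPM, the alpha described above is the realised return minus the return CAPM would predict from the stock's beta. The sketch below shows that arithmetic only. The return figures are invented, the function names are hypothetical, and estimating beta from raw (rather than excess) periodic returns is a simplifying assumption that may differ from what was done here.

```python
from statistics import mean


def estimate_beta(stock_returns, market_returns):
    """Beta = covariance(stock, market) / variance(market)."""
    ms, mm = mean(stock_returns), mean(market_returns)
    cov = sum((s - ms) * (m - mm) for s, m in zip(stock_returns, market_returns))
    var = sum((m - mm) ** 2 for m in market_returns)
    return cov / var


def capm_alpha(stock_return, market_return, risk_free, beta):
    """Alpha = realised return - [risk_free + beta * (market - risk_free)]."""
    expected = risk_free + beta * (market_return - risk_free)
    return stock_return - expected


# Invented periodic returns within one year (assumption, not real data)
stock = [0.02, -0.01, 0.03, 0.00]
market = [0.01, -0.005, 0.015, 0.00]

beta = estimate_beta(stock, market)
# Invented annual figures: stock 12%, market 8%, risk-free 3%
alpha = capm_alpha(0.12, 0.08, 0.03, beta)
print(round(beta, 4), round(alpha, 4))
```

Repeating this per stock and per year would give the annual alpha series that is then regressed on the prior year's sentiment measures.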
Rationale
- Working with alpha instead of logarithmic returns captures returns that are not related to market-wide price movements
- Attempts to isolate returns attributable to firm-specific factors such as governance or commodity price exposure
- Uses sentiment indicators generated from the prior year's reviews as a proxy to capture possible alpha derived from strong employee satisfaction
- One year time frame captures returns and sentiment on a longer term horizon
- Attempt to better capture governance effects
Findings
Our sentiment proxy had no explanatory power when attempting to predict alpha in the following year
Model fit was poor for all four companies
No model or factor significance was found
Counter-intuitive coefficient signs: the estimate on mean_rating is negative for three of the four companies
| Characteristic | Chevron Beta | Chevron 95% CI | Chevron p-value | BP Beta | BP 95% CI | BP p-value | ExxonMobil Beta | ExxonMobil 95% CI | ExxonMobil p-value | Shell Beta | Shell 95% CI | Shell p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mean_rating | -2.3 | -4.6, -0.10 | 0.043 | 0.91 | -1.2, 3.0 | 0.3 | -1.7 | -4.4, 0.94 | 0.2 | -1.2 | -4.1, 1.6 | 0.3 |
| sentiment | 1.5 | 0.14, 2.8 | 0.036 | -0.23 | -2.0, 1.6 | 0.8 | 0.46 | -1.9, 2.8 | 0.7 | 1.5 | -1.5, 4.5 | 0.3 |

CI = Confidence Interval
Portfolio-based Analysis
Two portfolios of our selected companies were created:
- Portfolio 1: Naive equally-weighted portfolio
- Rebalanced yearly based upon the prior year's closing price
- Portfolio 2: Optimized portfolio utilizing three sentiment indicators
- Generated weights that combine the input indicators into a single sentiment measure
- Optimization process to determine the ideal relative sentiment mix
- Training window: 2014-2021
- Rebalanced annually using aggregated sentiment data from the prior year
- Sentiment is treated on an absolute basis as opposed to on a relative basis to prior years
- Companies are proportionally de-weighted in cases of ‘review slippage’ relative to their peers
- No additional compounding is applied for positive or negative review drift
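One way to picture the sentiment-weighted portfolio is sketched below. It is a guess at the mechanics rather than the actual implementation: the indicator values and returns are invented, the indicators are assumed to be rescaled to a common 0 to 1 range, allocating in direct proportion to the composite score is an assumption, and all function names are hypothetical. The mix uses the indicator weights reported in the findings table.

```python
def composite_score(indicators, mix):
    """Blend several sentiment indicators into one score using the mix weights."""
    return sum(mix[name] * indicators[name] for name in mix)


def sentiment_weights(scores):
    """Allocate in proportion to each company's score (scores must be positive)."""
    total = sum(scores.values())
    return {ticker: score / total for ticker, score in scores.items()}


def portfolio_return(weights, returns):
    """Weighted sum of the individual company returns."""
    return sum(weights[t] * returns[t] for t in weights)


# Indicator mix as reported in the findings table
mix = {"positive_prop": 0.72, "rating": 0.0, "sentiment": 0.28}

# Invented prior-year indicators, assumed rescaled to [0, 1]
prior_year = {
    "XOM": {"positive_prop": 0.70, "rating": 0.60, "sentiment": 0.50},
    "CVX": {"positive_prop": 0.75, "rating": 0.65, "sentiment": 0.55},
    "SHEL": {"positive_prop": 0.80, "rating": 0.70, "sentiment": 0.60},
    "BP": {"positive_prop": 0.65, "rating": 0.55, "sentiment": 0.45},
}
# Invented next-year returns (not real data)
next_year_returns = {"XOM": 0.10, "CVX": 0.08, "SHEL": 0.12, "BP": 0.05}

scores = {t: composite_score(ind, mix) for t, ind in prior_year.items()}
optimized = sentiment_weights(scores)
naive = {t: 1 / len(scores) for t in scores}

print(portfolio_return(optimized, next_year_returns))
print(portfolio_return(naive, next_year_returns))
```

Under this scheme a company whose reviews slip relative to its peers automatically receives a smaller share, because the weights always sum to one.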
Rationale
- Firms with consistently superior Glassdoor reviews, as measured by our sentiment indicators, receive more weight
- Shell maintained relatively superior reviews over the analysis window
- Attempts to capture excess returns generated by superior employee sentiment
- Naive equally weighted portfolio intends to serve as a baseline
- Captures returns on longer term horizon
Findings
Utilizing our sentiment indicators, we were not able to generate a portfolio that outperformed our baseline during the test period
Assigning 72% weight to our Bing indicator and 28% weight to our Afinn indicator (both NLP) yielded returns in the 99th percentile with variance in the 90th percentile (training)
Test window (2020-2023) suggests superior returns observed in the training window were likely spurious
| Return | Variance | Positive Prop. | Rating | Sentiment | Portfolio |
|---|---|---|---|---|---|
| 0.4307042 | 0.0706839 | 0.72 | 0 | 0.28 | Optimized Sentiment-based |
| 0.4282957 | 0.0738970 | NA | NA | NA | Naive |
Review/Next Steps
The way in which we have captured sentiment and bound our review data to return data (i.e., by year) has very little utility for building a predictive model. It is clear that other factors, such as commodity prices, have a much larger impact. Still, we believe that high-frequency GlassDoor reviews, when combined with NLP, have the potential to capture employee sentiment as a proxy for excess returns.
- Dynamics of sentiment trend correspond with the economic environment and contain characteristics that point to the fact that they are more than white noise.
- Further analysis would include a much broader subset of companies, ideally with a higher number of total reviews.
- We were constrained by the run time of our webscraping program.
- We would like to explore different sectors.
- There may be a bigger impact in service-based companies where revenues are less dependent on property, plant and equipment (PP&E) and asset rights.
- Explore various machine learning techniques to see if there is model improvement.